Regular Expressions

Regular expressions are a powerful way to match patterns in text. The regular expression engine attempts to match the regular expression against the input string. Such matching starts at the beginning of the string and moves from left to right, so the match that starts earliest in the string is the one reported. The matching is considered to be "greedy", because at any given point, it will always match the longest possible substring. For example, if a regular expression could match the substring 'aa' or 'aaa' at the same position, it will always take the longer option.
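The following is a minimal sketch of this left-to-right, longest-match behavior. It assumes IDL's STREGEX function, which the text above does not name; the variable names are illustrative only.

```idl
; 'a+' could match 'a', 'aa', or 'aaa' starting at offset 1 of the string.
str = 'baaad'

; STREGEX returns the zero-based offset of the match; LENGTH returns its size.
pos = STREGEX(str, 'a+', LENGTH=len)
PRINT, pos, len                        ; offset 1, length 3

; With /EXTRACT, the matched text itself is returned.
PRINT, STREGEX(str, 'a+', /EXTRACT)    ; aaa
```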

Meta Characters

Most characters in a regular expression are "ordinary", which indicates that they have no special meaning and only match themselves. Certain characters, sometimes called "meta characters", have special meanings. To use a meta character as an ordinary character, you need to "escape" it by preceding it with a backslash character (for example, "\*").
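As a short illustration of escaping, the sketch below compares an escaped and an unescaped meta character. It assumes IDL's STREGEX function; note that IDL string literals do not process backslashes themselves, so the backslash reaches the regular expression engine unchanged.

```idl
; Escaped: '\*' is an ordinary asterisk, so the whole of '2*3' is matched.
PRINT, STREGEX('2*3=6', '2\*3', /EXTRACT)    ; 2*3

; Unescaped: '2*' means "zero or more 2s", so only the '3' is matched.
PRINT, STREGEX('2*3=6', '2*3', /EXTRACT)     ; 3

; Unescaped '.' matches any character; escaped '\.' matches only a period.
PRINT, STREGEX('3x14', '3.14', /BOOLEAN)     ; true (1)
PRINT, STREGEX('3x14', '3\.14', /BOOLEAN)    ; false (0)
```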

The meta characters are described in the following table:

Character    Description

.    The period matches any character.

[ ]    The open bracket character indicates a "bracket expression", discussed below. The close bracket character terminates such an expression.

\    The backslash suppresses the special meaning of the character it precedes, and turns it into an ordinary character. To insert a backslash into your regular expression pattern, use a double backslash ('\\').

( )    The open parenthesis indicates a "subexpression", discussed below. The close parenthesis character terminates such a subexpression.

*    Zero or more of the character or expression to the left. Hence, 'a*' means zero or more instances of 'a'.

+    One or more of the character or expression to the left. Hence, 'a+' means one or more instances of 'a'.

?    Zero or one of the character or expression to the left. Hence, 'a?' will match 'a' or the empty string ''.

{ }    An interval qualifier allows you to specify exactly how many instances of the character or expression to the left to match. For example, 'a{3}' will match 'aaa'. You can also specify two integers separated by a comma to specify a range of repetitions. For example, 'a{2,4}' will match 'aa', 'aaa', or 'aaaa'. Note that '{0,1}' is equivalent to '?'.

|    Alternation. This operator indicates that one of several possible choices can match. For example, '(a|b|c)z' will match any of 'az', 'bz', or 'cz'.

^ $    Anchors. A '^' matches the beginning of a string, and '$' matches the end. For example, '^abc' will only match strings that start with 'abc', and '^abc$' will only match the string 'abc' itself.
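The repetition, alternation, and anchor operators can be tried out as sketched below. The example assumes IDL's STREGEX function with its BOOLEAN and EXTRACT keywords; the input strings are made up for illustration.

```idl
; Interval: at most four 'a' characters are taken, even though five are present.
PRINT, STREGEX('aaaaa', 'a{2,4}', /EXTRACT)                 ; aaaa

; Optional character: both spellings match.
PRINT, STREGEX(['color', 'colour'], 'colou?r', /BOOLEAN)    ; true, true

; Alternation: 'dz' does not match because 'd' is not one of the choices.
PRINT, STREGEX(['az', 'bz', 'dz'], '(a|b|c)z', /BOOLEAN)    ; true, true, false

; Anchors: only the string that is exactly 'abc' matches.
PRINT, STREGEX(['abc', 'abcd', 'xabc'], '^abc$', /BOOLEAN)  ; true, false, false
```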

Subexpressions

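Subexpressions are those parts of a regular expression enclosed in parentheses. There are two reasons to use subexpressions: to apply a repetition operator to more than one character (for example, '(fun){3}' matches 'funfunfun', whereas 'fun{3}' matches 'funnn'), and to locate the part of the match that corresponds to the subexpression (in IDL, using the SUBEXPR keyword to STREGEX).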
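Two typical uses of subexpressions, grouping for repetition and extracting part of a match, might look like the following sketch. It assumes IDL's STREGEX function with the SUBEXPR and EXTRACT keywords; the sample strings are hypothetical.

```idl
; Grouping: the interval applies to the whole subexpression 'fun'.
PRINT, STREGEX('funfunfun', '(fun){3}', /EXTRACT)    ; funfunfun

; Without parentheses, the interval applies only to the 'n'.
PRINT, STREGEX('funnn', 'fun{3}', /EXTRACT)          ; funnn

; Locating subexpressions: element 0 is the overall match,
; and elements 1 and 2 are the two parenthesized parts.
parts = STREGEX('Mike is 24', '([A-Za-z]+) is ([0-9]+)', /SUBEXPR, /EXTRACT)
PRINT, parts[1]    ; Mike
PRINT, parts[2]    ; 24
```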

Bracket Expressions

Bracket expressions (expressions enclosed in square brackets) are used to specify a set of characters that can satisfy a match. Many of the meta characters described above (such as '.', '*', '[', and '\') lose their special meaning within a bracket expression. The right bracket loses its special meaning if it occurs as the first character in the expression (after an initial '^', if any).

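There are several different forms of bracket expressions, including:

Form    Description

Matching list    A list of characters, any one of which can match. For example, '[abc]' matches 'a', 'b', or 'c'.

Non-matching list    A list that begins with '^' matches any character not in the list. For example, '[^abc]' matches any character except 'a', 'b', or 'c'.

Range expression    Two characters separated by '-' match any character in that range. For example, '[0-9]' matches any decimal digit.

Character class    A class name enclosed in '[:' and ':]' within the brackets matches any character of that class. For example, '[[:digit:]]' matches any decimal digit.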
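A few common bracket expression forms are sketched below: a range, a non-matching list introduced by '^', a POSIX character class, and meta characters used literally inside brackets. The example assumes IDL's STREGEX function; the sample strings are illustrative.

```idl
str = 'abc123'

; Range expression: one or more digits.
PRINT, STREGEX(str, '[0-9]+', /EXTRACT)          ; 123

; Non-matching list: one or more characters that are not digits.
PRINT, STREGEX(str, '[^0-9]+', /EXTRACT)         ; abc

; Character class: equivalent to the digit range above.
PRINT, STREGEX(str, '[[:digit:]]+', /EXTRACT)    ; 123

; Inside brackets the period is an ordinary character.
PRINT, STREGEX('a.b', '[.]')                     ; offset 1

; A right bracket placed first in the list is an ordinary character.
PRINT, STREGEX('x]y', '[]]')                     ; offset 1
```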

Special Characters

Special (non-printing) characters are often represented in regular expressions using backslash escape codes, such as \t to represent a TAB character or \n to represent a newline character. IDL does not support these backslash codes in regular expressions. Instead, you can use the ASCII value to represent these characters:

ASCII Character    Byte Value

Bell    7b

Backspace    8b

Horizontal Tab    9b

Linefeed    10b

Vertical Tab    11b

Formfeed    12b

Carriage Return    13b

Escape    27b

For example, to represent the TAB character, use the expression STRING(9B).

This syntax can be used when comparing strings or performing regular expression matching. For example, to find the position of the first TAB character in a string:

result = STRPOS(str, STRING(9B))

where str is a variable containing the string to be searched.
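The same byte-value approach can be used to build a regular expression that contains a special character, by concatenating STRING(9B) into the pattern. The sketch below assumes IDL's STRPOS, STREGEX, and STRSPLIT functions; the variable names and sample data are hypothetical.

```idl
tab = STRING(9B)                  ; a one-character string holding a TAB
line = 'name' + tab + 'value'

; Plain substring search: zero-based offset of the first TAB.
PRINT, STRPOS(line, tab)          ; 4

; Regular expression containing a TAB between two lowercase words.
PRINT, STREGEX(line, '[a-z]+' + tab + '[a-z]+', /BOOLEAN)    ; true (1)

; Splitting on the TAB yields the two fields 'name' and 'value'.
fields = STRSPLIT(line, tab, /EXTRACT)
PRINT, fields[0]                  ; name
PRINT, fields[1]                  ; value
```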